TITLE: Explore and Summarize White Wine Quality Data

Citation Request:
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

Relevant Information:
The datasets are related to white variants of the Portuguese “Vinho Verde” wine. For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are much more normal wines than excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent or poor wines. Also, we are not sure if all input variables are relevant. So it could be interesting to test feature selection methods.

Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data): 12 - quality (score between 0 and 10)

Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily).
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste.
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines.
4 - residual sugar: the amount of sugar remaining after fermentation stops, it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet.
5 - chlorides: the amount of salt in the wine.
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content.
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale.
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.
11 - alcohol: the percent alcohol content of the wine.
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Load white wine data

##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.0             0.27        0.36           20.7     0.045
## 2 2           6.3             0.30        0.34            1.6     0.049
## 3 3           8.1             0.28        0.40            6.9     0.050
## 4 4           7.2             0.23        0.32            8.5     0.058
## 5 5           7.2             0.23        0.32            8.5     0.058
## 6 6           8.1             0.28        0.40            6.9     0.050
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  45                  170  1.0010 3.00      0.45     8.8
## 2                  14                  132  0.9940 3.30      0.49     9.5
## 3                  30                   97  0.9951 3.26      0.44    10.1
## 4                  47                  186  0.9956 3.19      0.40     9.9
## 5                  47                  186  0.9956 3.19      0.40     9.9
## 6                  30                   97  0.9951 3.26      0.44    10.1
##   quality
## 1       6
## 2       6
## 3       6
## 4       6
## 5       6
## 6       6
The output above shows the first rows of the data frame, obtained with the head() function.
##         X fixed.acidity volatile.acidity citric.acid residual.sugar
## 4893 4893           6.5             0.23        0.38            1.3
## 4894 4894           6.2             0.21        0.29            1.6
## 4895 4895           6.6             0.32        0.36            8.0
## 4896 4896           6.5             0.24        0.19            1.2
## 4897 4897           5.5             0.29        0.30            1.1
## 4898 4898           6.0             0.21        0.38            0.8
##      chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 4893     0.032                  29                  112 0.99298 3.29
## 4894     0.039                  24                   92 0.99114 3.27
## 4895     0.047                  57                  168 0.99490 3.15
## 4896     0.041                  30                  111 0.99254 2.99
## 4897     0.022                  20                  110 0.98869 3.34
## 4898     0.020                  22                   98 0.98941 3.26
##      sulphates alcohol quality
## 4893      0.54     9.7       5
## 4894      0.50    11.2       6
## 4895      0.46     9.6       5
## 4896      0.46     9.4       6
## 4897      0.38    12.8       7
## 4898      0.32    11.8       6
The output above shows the last rows of the data frame, obtained with the tail() function.
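
The loading step itself is not shown in this report, so the following is only a minimal sketch of how such output could be produced. The file path is unknown and none is assumed: the example reads the six rows printed by head() above from an inline string. The object name wine_df is borrowed from the by() output later in the report.

```r
# Inline CSV holding the six rows printed by head() above.
# In the real analysis these would come from a file via read.csv(<path>);
# the path is not given in the report, so it is not guessed here.
csv_text <- "X,fixed.acidity,volatile.acidity,citric.acid,residual.sugar,chlorides,free.sulfur.dioxide,total.sulfur.dioxide,density,pH,sulphates,alcohol,quality
1,7.0,0.27,0.36,20.7,0.045,45,170,1.0010,3.00,0.45,8.8,6
2,6.3,0.30,0.34,1.6,0.049,14,132,0.9940,3.30,0.49,9.5,6
3,8.1,0.28,0.40,6.9,0.050,30,97,0.9951,3.26,0.44,10.1,6
4,7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.40,9.9,6
5,7.2,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.40,9.9,6
6,8.1,0.28,0.40,6.9,0.050,30,97,0.9951,3.26,0.44,10.1,6"

wine_df <- read.csv(text = csv_text)

# First and last rows; n sets how many rows are returned (the default is 6).
head(wine_df, n = 3)
tail(wine_df, n = 3)
```

With the full dataset, head(wine_df) and tail(wine_df) without an n argument would return six rows each, as in the output above.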

Dimensions of Data

Let’s quickly check the dimensions of our data, i.e., rows and columns.
## [1] 4898   13
The white wine dataset has 4898 rows and 13 columns, on which we do further analysis.

Features of Data

We will take a quick glance at the feature names of the white wine dataset.
##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"

Structure of Data

The str() function gives a short summary of all the features present in a data frame.
## 'data.frame':    4898 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
##  $ volatile.acidity    : num  0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
##  $ citric.acid         : num  0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
##  $ residual.sugar      : num  20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
##  $ chlorides           : num  0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
##  $ free.sulfur.dioxide : num  45 14 30 47 47 30 30 45 14 28 ...
##  $ total.sulfur.dioxide: num  170 132 97 186 186 97 136 170 132 129 ...
##  $ density             : num  1.001 0.994 0.995 0.996 0.996 ...
##  $ pH                  : num  3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
##  $ sulphates           : num  0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
##  $ alcohol             : num  8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
##  $ quality             : int  6 6 6 6 6 6 6 6 6 6 ...
As we can see, all 13 variables are numeric: 11 are of type num and 2 (X and quality) are of type int.
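
As a rough illustration of the three inspection steps above (dimensions, feature names, structure), here is a sketch on a small stand-in data frame. The name toy and its three columns are assumptions made for the example; the values are taken from the first rows shown earlier.

```r
# Small stand-in data frame; `toy` is a hypothetical name for illustration.
toy <- data.frame(
  X = 1:3,
  fixed.acidity = c(7.0, 6.3, 8.1),
  quality = c(6L, 6L, 6L)
)

dim(toy)    # number of rows, then number of columns
names(toy)  # column (feature) names
str(toy)    # storage type and first values of each column

# Count the columns by storage type, as read off the str() output.
table(sapply(toy, class))
```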

Missing Value of Data

## [1] 0
‘0’ missing values! This means the dataset does not have any missing data in any feature.
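
A count like the one above can be obtained by summing is.na() over the data frame. The sketch below assumes a small stand-in data frame (toy) with one NA introduced on purpose, only to show what a non-zero result looks like; the real dataset contains no missing values.

```r
# Stand-in data with one deliberately missing pH value (illustration only).
toy <- data.frame(pH = c(3.00, 3.30, NA), alcohol = c(8.8, 9.5, 10.1))

sum(is.na(toy))      # total number of missing cells in the data frame
colSums(is.na(toy))  # number of missing cells per column

# Keep only the rows that have no missing value in any column.
complete <- toy[complete.cases(toy), ]
```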
Introduction: The white wine dataset has 4898 rows and 13 columns, with no missing (NA) values. In this project we carry out univariate, bivariate and multivariate analyses, plotting different types of graphs to explore which factors make a white wine good and which make it poor.

Univariate Plots Section

In this section we visualize single variables of the dataset with histograms and boxplots, looking at the distributions of the factors that may matter for white wine quality.

For white wine ‘Quality’ feature:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.878   6.000   9.000
## 
##    3    4    5    6    7    8    9 
##   20  163 1457 2198  880  175    5
## 
##    Low Medium   High 
##    183   3655   1060
## wine_df$quality.rate: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   3.891   4.000   4.000 
## -------------------------------------------------------- 
## wine_df$quality.rate: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.000   5.000   6.000   5.601   6.000   6.000 
## -------------------------------------------------------- 
## wine_df$quality.rate: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.000   7.000   7.000   7.175   7.000   9.000
Observation : Most white wines are of medium quality, with ratings of 5 and 6. Very few wines are rated 8 or 9: only 5 wines have the maximum rating of 9, and only 20 wines have the minimum rating of 3.

Observation : White wine quality is approximately normally distributed.

For white wine ‘Alcohol’ level:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.50   10.40   10.51   11.40   14.20

Observation : In this plot the alcohol level is positively (right) skewed. Wines with about 9.5% alcohol have the highest count.

For white wine amount of ‘Residual Sugar’:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   1.700   5.200   6.391   9.900  65.800

Observation : The graph above shows a right-skewed, bimodal distribution of residual sugar. An outlier is clearly visible at 65.800 g/dm^3, and the highest peak lies between 1 and 2 g/dm^3.

For white wine ‘Acidity’ feature:

## [1] "Summary of fixed.acidity"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.800   6.300   6.800   6.855   7.300  14.200
## [1] "Summary of volatile.acidity"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0800  0.2100  0.2600  0.2782  0.3200  1.1000
## [1] "Summary of citric.acid"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2700  0.3200  0.3342  0.3900  1.6600

Observation : Acid makes wine taste sour, and if acidity is too low the wine loses its taste. Fixed acidity is approximately normally distributed, and an outlier at 14.2 g/dm^3 is clearly visible in the graph. The median fixed acidity of the white wines is 6.800 g/dm^3.

Volatile acidity is also approximately normally distributed, with outliers (up to 1.1000 g/dm^3) visible at the higher end of the scale. The median value is 0.2600 g/dm^3.
Citric acid is positively (right) skewed, with an outlier at 1.6600 g/dm^3 and a median of 0.3200 g/dm^3. It is the only variable in the dataset with zero values, and these occur in only a very small share of the wines.

Summary of remaining variables

## [1] "Summary of chlorides"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
## [1] "Summary of free.sulfur.dioxide"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   23.00   34.00   35.31   46.00  289.00
## [1] "Summary of total.sulfur.dioxide"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     9.0   108.0   134.0   138.4   167.0   440.0
## [1] "Summary of density"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9871  0.9917  0.9937  0.9940  0.9961  1.0390
## [1] "Summary of pH"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.720   3.090   3.180   3.188   3.280   3.820
## [1] "Summary of sulphates"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2200  0.4100  0.4700  0.4898  0.5500  1.0800
Observation : Most wines have a pH between 3 and 4, and the white wines here have a median pH of 3.180. Chlorides (sodium chloride) give the wine its saltiness, with a median of 0.04300 g/dm^3. Sulfur dioxide (SO2) is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm it becomes evident in the nose and taste; the median free SO2 is 34.00 mg/dm^3 and the median total SO2 is 134.0 mg/dm^3. The median density is 0.9937 g/cm^3, which depends on the residual sugar and alcohol content.

Histograms for each of the remaining variables

Observation : Most of the remaining graphs show a roughly normal distribution, but sulphates and density look bimodal, and all of the variables have outliers.

Removing outliers from the univariate plots.

Observation : As the graphs above show, with the outliers removed the distributions are easier to read. Residual sugar and citric acid have positively (right) skewed distributions. pH, total sulfur dioxide, free sulfur dioxide, fixed acidity and volatile acidity have roughly normal distributions, and alcohol, sulphates, density and chlorides have bimodal distributions.
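
The report does not say which rule was used to drop outliers before plotting. One common option, shown here purely as an assumption, is to keep values at or below the 99th percentile. The vector below combines the residual sugar values from the first and last rows printed earlier with the reported maximum of 65.8.

```r
# Residual sugar values from the head()/tail() output, plus the maximum (65.8).
residual.sugar <- c(20.7, 1.6, 6.9, 8.5, 8.5, 6.9,
                    1.3, 1.6, 8.0, 1.2, 1.1, 0.8, 65.8)

# Assumed rule: treat everything above the 99th percentile as an outlier.
upper <- quantile(residual.sugar, probs = 0.99)

# Trim for plotting only; the original vector is left untouched.
trimmed <- residual.sugar[residual.sugar <= upper]

length(trimmed)
max(trimmed)
```

The trimmed vector can then be passed to a histogram function, while all later analysis keeps using the untrimmed data.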

Univariate Analysis

Summary of White wine dataset

## [1] 4898   14
##        X        fixed.acidity    volatile.acidity  citric.acid    
##  Min.   :   1   Min.   : 3.800   Min.   :0.0800   Min.   :0.0000  
##  1st Qu.:1225   1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700  
##  Median :2450   Median : 6.800   Median :0.2600   Median :0.3200  
##  Mean   :2450   Mean   : 6.855   Mean   :0.2782   Mean   :0.3342  
##  3rd Qu.:3674   3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900  
##  Max.   :4898   Max.   :14.200   Max.   :1.1000   Max.   :1.6600  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.600   Min.   :0.00900   Min.   :  2.00     
##  1st Qu.: 1.700   1st Qu.:0.03600   1st Qu.: 23.00     
##  Median : 5.200   Median :0.04300   Median : 34.00     
##  Mean   : 6.391   Mean   :0.04577   Mean   : 35.31     
##  3rd Qu.: 9.900   3rd Qu.:0.05000   3rd Qu.: 46.00     
##  Max.   :65.800   Max.   :0.34600   Max.   :289.00     
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  9.0        Min.   :0.9871   Min.   :2.720   Min.   :0.2200  
##  1st Qu.:108.0        1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100  
##  Median :134.0        Median :0.9937   Median :3.180   Median :0.4700  
##  Mean   :138.4        Mean   :0.9940   Mean   :3.188   Mean   :0.4898  
##  3rd Qu.:167.0        3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500  
##  Max.   :440.0        Max.   :1.0390   Max.   :3.820   Max.   :1.0800  
##     alcohol         quality      quality.rate 
##  Min.   : 8.00   Min.   :3.000   Low   : 183  
##  1st Qu.: 9.50   1st Qu.:5.000   Medium:3655  
##  Median :10.40   Median :6.000   High  :1060  
##  Mean   :10.51   Mean   :5.878                
##  3rd Qu.:11.40   3rd Qu.:6.000                
##  Max.   :14.20   Max.   :9.000

What is the structure of your dataset?

There are 4898 rows in the white wine dataset, with 13 variables. Of these, 11 are quantitative (fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol), 1 is qualitative (“quality”) and 1 is an index (“X”).
Other observations:

01. For the ‘quality’ variable the median is 6 and the mean is 5.878. Quality scores range from 3 (lowest) to 9 (highest), and most wines are rated 5 or 6.
02. Fixed acidity is comparatively high, with a minimum of 3.8 and a maximum of 14.2 g/dm^3, while volatile acidity ranges from 0.08 to 1.1 g/dm^3 and citric acid from 0 to 1.66 g/dm^3.
03. Most pH values lie between 3 and 3.30.
04. All of the features have a minimum value greater than 0 except citric acid.
05. Fixed acidity and residual sugar have the highest medians of the variables measured in g/dm^3.
06. The alcohol content of the white wines ranges from 8.00% to 14.20% by volume.
07. After creating the new variable ‘quality.rate’ we have a total of 14 variables.

What is/are the main feature(s) of interest in your dataset?

Quality is the main feature of interest in the white wine dataset. We go on to analyze how quality correlates with the other variables in the dataset.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Alcohol, residual sugar and pH level play an important role.

Density, fixed acidity, volatile acidity and citric acid also provide a different view for the analysis. It is also interesting to see how these variables relate to one another in the bivariate and multivariate analyses.

Did you create any new variables from existing variables in the dataset?

We created a new variable for the quality rating, named “quality.rate”, from the existing ‘quality’ variable. ‘quality.rate’ has three ratings: Low (3, 4), Medium (5, 6) and High (7, 8, 9).
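
The exact call used to build quality.rate is not shown, so the following is one possible sketch using cut(). The quality vector is rebuilt from the frequency table reported in the univariate section; the break points c(2, 4, 6, 9) are an assumption chosen to reproduce the three groups described above.

```r
# Rebuild the quality column from the reported frequency table.
counts <- c("3" = 20, "4" = 163, "5" = 1457, "6" = 2198,
            "7" = 880, "8" = 175, "9" = 5)
quality <- rep(as.integer(names(counts)), times = counts)

# Intervals are right-closed: (2,4] -> Low, (4,6] -> Medium, (6,9] -> High.
quality.rate <- cut(quality,
                    breaks = c(2, 4, 6, 9),
                    labels = c("Low", "Medium", "High"))

table(quality.rate)                  # group sizes
tapply(quality, quality.rate, mean)  # mean quality score within each group
```

The group sizes and means computed this way agree with the Low, Medium and High figures printed in the ‘Quality’ summary earlier in the report.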

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The white wine dataset has no missing values, and we did not find any unusual distributions, so no operations were needed to tidy, adjust or change the form of the data. We did observe outliers in the histograms. We did not remove them permanently from the dataset, but in the section above we examined how the histograms look with the outliers removed.

Bivariate Plots Section

Draw and analyze a correlation plot of the white wine dataset.

Draw a heatmap to examine the correlations between variables further.

Observation : From both graphs we find that residual sugar and density, free sulfur dioxide and density, and total sulfur dioxide and density are positively correlated. Density and alcohol, total sulfur dioxide and alcohol, and residual sugar and alcohol are negatively correlated. The relationships between density and alcohol, and between total and free sulfur dioxide, are strong.

Correlation between Quality and other variables.

##                              [,1]
## fixed.acidity        -0.113662831
## volatile.acidity     -0.194722969
## citric.acid          -0.009209091
## residual.sugar       -0.097576829
## chlorides            -0.209934411
## free.sulfur.dioxide   0.008158067
## total.sulfur.dioxide -0.174737218
## density              -0.307123313
## pH                    0.099427246
## sulphates             0.053677877
## alcohol               0.435574715
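Observation : As seen above, free.sulfur.dioxide, pH, sulphates and alcohol have positive correlations with quality, although only alcohol (0.436) is of notable size; citric.acid is marginally negative (-0.009). Density (-0.307) has the strongest negative correlation with quality.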
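
A one-column correlation table like the one above is what cor() returns when it is given a data frame of predictors and a single response vector. The sketch below uses only the six rows from the tail() output and three of the predictors, so its numbers will differ from the full-data values; the names toy, predictors and cors are assumptions for the example.

```r
# Six rows taken from the tail() output shown earlier (three predictors only).
toy <- data.frame(
  density = c(0.99298, 0.99114, 0.99490, 0.99254, 0.98869, 0.98941),
  pH      = c(3.29, 3.27, 3.15, 2.99, 3.34, 3.26),
  alcohol = c(9.7, 11.2, 9.6, 9.4, 12.8, 11.8),
  quality = c(5, 6, 5, 6, 7, 6)
)

# Pearson correlation of every predictor with quality (a 3 x 1 matrix).
predictors <- setdiff(names(toy), "quality")
cors <- cor(toy[, predictors], toy$quality)

# Sort by absolute size so the strongest relationships come first.
cors[order(abs(cors[, 1]), decreasing = TRUE), , drop = FALSE]
```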

Creating boxplot and scatterplot functions for further bivariate plotting and analysis

Quality Vs Residual Sugar

## wine_df$quality: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.700   1.587   4.600   6.393  10.700  16.200 
## -------------------------------------------------------- 
## wine_df$quality: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.700   1.300   2.500   4.628   7.100  17.550 
## -------------------------------------------------------- 
## wine_df$quality: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   1.800   7.000   7.335  11.500  23.500 
## -------------------------------------------------------- 
## wine_df$quality: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.700   1.700   5.300   6.442   9.900  65.800 
## -------------------------------------------------------- 
## wine_df$quality: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.700   3.650   5.186   7.325  19.250 
## -------------------------------------------------------- 
## wine_df$quality: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.800   2.100   4.300   5.671   8.200  14.800 
## -------------------------------------------------------- 
## wine_df$quality: 9
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.60    2.00    2.20    4.12    4.20   10.60

Observation : Residual sugar has very little effect on quality. Wines rated 5 have the highest median (7.000) and mean (7.335) residual sugar.
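
The layout of the grouped summaries above is what by() prints when a summary function is applied per group. As a small sketch, assuming a stand-in wine_df built from the six tail() rows only:

```r
# Stand-in data: residual sugar and quality from the tail() output.
wine_df <- data.frame(
  residual.sugar = c(1.3, 1.6, 8.0, 1.2, 1.1, 0.8),
  quality        = c(5, 6, 5, 6, 7, 6)
)

# Six-number summary of residual sugar within each quality score.
by(wine_df$residual.sugar, wine_df$quality, summary)

# A single statistic per group is often easier to compare.
med <- tapply(wine_df$residual.sugar, wine_df$quality, median)
med
```

On the full dataset the same two calls would give one block per quality score from 3 to 9.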

Alcohol Vs Quality Rating

## wine_df$quality.rate: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.40   10.10   10.17   10.80   13.50 
## -------------------------------------------------------- 
## wine_df$quality.rate: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.40   10.00   10.27   11.00   14.00 
## -------------------------------------------------------- 
## wine_df$quality.rate: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.50   10.70   11.50   11.42   12.40   14.20

Observation : Good-quality wines have the highest alcohol content (median 11.50% for High, against 10.10% for Low and 10.00% for Medium). Alcohol also has a positive relationship with quality.

pH Vs Quality

## wine_df$quality: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.870   3.035   3.215   3.188   3.325   3.550 
## -------------------------------------------------------- 
## wine_df$quality: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.830   3.070   3.160   3.183   3.280   3.720 
## -------------------------------------------------------- 
## wine_df$quality: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.790   3.080   3.160   3.169   3.240   3.790 
## -------------------------------------------------------- 
## wine_df$quality: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.720   3.080   3.180   3.189   3.280   3.810 
## -------------------------------------------------------- 
## wine_df$quality: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.840   3.100   3.200   3.214   3.320   3.820 
## -------------------------------------------------------- 
## wine_df$quality: 8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.940   3.120   3.230   3.219   3.330   3.590 
## -------------------------------------------------------- 
## wine_df$quality: 9
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.200   3.280   3.280   3.308   3.370   3.410

Observation : In the boxplot, pH tends to move upward as wine quality increases. In the scatter plot, pH moves slightly downward between ratings 3 and 4, but from ratings 5 and 6 onward it increases with quality.

Boxplot of Sulphates, fixed.acidity, volatile.acidity, citric.acid, chlorides Vs Quality Rating

Observation : In the boxplots above, sulphates, citric acid and fixed acidity show similar distributions across the quality ratings, with no clear trend. Volatile acidity and chlorides show a downward trend and also have many outliers.

Scatter plot for Free sulfur dioxide,Total sulfur dioxide,Density Vs Wine Quality

Observation : Higher free and total sulfur dioxide go with lower wine quality, and lower density goes with higher wine quality.

Scatter plot between pH Vs Alcohol.

## [1] "Correlation with pH and alcohol"
## 
##  Pearson's product-moment correlation
## 
## data:  alcohol and pH
## t = 8.5601, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.09374446 0.14893205
## sample estimates:
##       cor 
## 0.1214321

Observation : As seen in the graph, pH and alcohol percentage have a weak positive correlation (0.1214321). The highest pH values occur at between 10% and 12% alcohol.
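
The t statistic in the test output above can be recovered from the correlation alone, using the standard relation t = r * sqrt(n - 2) / sqrt(1 - r^2) for a Pearson correlation. The sketch below plugs in the reported r and the 4898 rows of the dataset; the variable names are chosen for the example.

```r
r <- 0.1214321  # sample correlation between alcohol and pH (reported above)
n <- 4898       # number of wines in the dataset

dof <- n - 2                              # degrees of freedom of the test
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)   # t statistic for H0: correlation = 0

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), dof)

round(t_stat, 4)
dof
```

This shows why even a weak correlation of about 0.12 is highly significant here: with almost 4900 observations, the sqrt(n - 2) factor is large.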

Scatter plot between Density VS Alcohol & Residual Sugar.

## [1] "Correlation with density and alcohol"
## 
##  Pearson's product-moment correlation
## 
## data:  alcohol and density
## t = -87.255, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7908646 -0.7689315
## sample estimates:
##        cor 
## -0.7801376

## [1] "Correlation with density and residual sugar"
## 
##  Pearson's product-moment correlation
## 
## data:  residual.sugar and density
## t = 107.87, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8304732 0.8470698
## sample estimates:
##       cor 
## 0.8389665

Observation : As seen in the graph, wines with a higher percentage of alcohol have lower density; alcohol and density are negatively correlated (-0.7801376). Density and residual sugar have the strongest correlation (0.8389665), and they increase together.

Scatter plot between Sulphates vs total & free sulfur dioxide.

## [1] "Correlation with sulphates and free.sulfur.dioxide"
## 
##  Pearson's product-moment correlation
## 
## data:  free.sulfur.dioxide and sulphates
## t = 4.1508, df = 4896, p-value = 3.369e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.03126264 0.08707928
## sample estimates:
##        cor 
## 0.05921725

## [1] "Correlation with sulphates and total.sulfur.dioxide"
## 
##  Pearson's product-moment correlation
## 
## data:  total.sulfur.dioxide and sulphates
## t = 9.5019, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1069590 0.1619585
## sample estimates:
##       cor 
## 0.1345624

Observation : As seen in the graphs, sulphates has a weak positive correlation with both free and total sulfur dioxide.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

As seen above, free.sulfur.dioxide, pH, sulphates and alcohol have positive correlations with quality, although only alcohol (0.436) is of notable size; citric.acid is marginally negative (-0.009). Good wines also tend to have high free sulfur dioxide and a high ratio of free to total sulfur dioxide.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

We observe a weak positive relationship between sulphates and free sulfur dioxide (cor 0.05921725) and between sulphates and total sulfur dioxide (cor 0.1345624). Alcohol and pH also have a weak positive relationship (cor 0.1214321).

What was the strongest relationship you found?

Density has the strongest relationship with residual sugar (cor 0.8389665), followed by its negative relationship with alcohol (cor -0.7801376). Total and free sulfur dioxide also have a strong relationship.

Multivariate Plots Section

Fixed acidity, citric acid, volatile acidity and alcohol by quality and quality rating

Observation : Fixed acidity and citric acid have a good positive correlation, which shows in qualities 5 and 6, that is, in medium-quality wines. High-quality wines have high alcohol and low volatile acidity, and vice versa for low-quality wines. In medium-quality wines both alcohol and volatile acidity are spread everywhere. Quality has its strongest correlation with alcohol, and alcohol has a weak positive correlation with volatile.acidity (0.068). The worst-quality wines appear at low alcohol and high volatile acidity.

In wines of quality 5 and 6 we see a wide range of citric acid together with a low alcohol percentage, and high-quality wines have low citric acid.

Free and total sulfur dioxide, alcohol and sulphates by quality and quality rating

Observation : Total and free sulfur dioxide have a good positive correlation with each other, most visibly in medium-quality wines. In medium-quality wines alcohol and sulphates also show a good correlation.

Free and total sulfur dioxide, density and residual sugar by quality and quality rating

Observation : Free and total sulfur dioxide have a good correlation with density, including in high-quality wines. In quality ratings 5, 6 and 7 we see a good positive relation. Residual sugar and density have a strong relation in high-quality wines, and density increases as residual sugar increases.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

We observed the following relationships during our investigation:
01. We find very interesting relations between alcohol, residual sugar, total and free sulfur dioxide, sulphates, density and quality.
02. Alcohol increases as wine quality increases, while density and residual sugar tend to decrease (their correlations with quality are -0.307 and -0.098).
03. Free sulfur dioxide has almost no linear relation with wine quality (0.008), while total sulfur dioxide has a weak negative one (-0.175).
04. In the white wine dataset density and alcohol are negatively correlated. Higher-quality wines have a high alcohol percentage and a low density.
05. High-quality wines tend to have slightly higher sulphate levels.
06. We have seen how alcohol and volatile acidity relate to quality. Higher alcohol and lower volatile acidity generally give better-quality wines.

Were there any interesting or surprising interactions between features?

## [1] "Alcohol correlation with Quality"
## [1] 0.4355747
## [1] "Alcohol correlation with Other variables"
##                             [,1]
## fixed.acidity        -0.12088112
## volatile.acidity      0.06771794
## citric.acid          -0.07572873
## residual.sugar       -0.45063122
## chlorides            -0.36018871
## free.sulfur.dioxide  -0.25010394
## total.sulfur.dioxide -0.44889210
## density              -0.78013762
## pH                    0.12143210
## sulphates            -0.01743277
## alcohol               1.00000000
We found it interesting that alcohol is positively correlated with quality, pH and volatile acidity. Another interesting finding is that alcohol and density are negatively correlated: higher-quality wines have a high alcohol percentage and a low density. Residual sugar and density are very strongly and positively correlated.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.

No, we didn’t create any models with our dataset.

Final Plots and Summary

Plot One

Description One

Medium-quality wines are those with a quality rating of 5 or 6. In the bar chart the color palette makes it easy to identify the quality of the wines from Low to High.

Plot Two

Description Two

As observed previously, wines with higher residual sugar have higher density, and when residual sugar is low, density is also low.

Plot Three

Description Three

We observe the correlation between density, residual sugar and alcohol by quality rating in a single plot. We cut alcohol at 7, 9.5, 10.4, 11.4 and 14.2 in order to see trends.
High-quality wines have a higher alcohol percentage and lower density and residual sugar. The graph shows the positive association between alcohol and wine quality. Medium-quality wines show a clearer relationship between residual sugar and density across the alcohol bins.
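
The binning of alcohol described above can be sketched with cut(). The break points are the ones listed in the description; after the lower bound of 7 they coincide with the quartiles and maximum of alcohol reported in the univariate summary. The input values and the names alcohol and alcohol.bin are assumptions for illustration only.

```r
# A few illustrative alcohol values (% by volume), not the full column.
alcohol <- c(8.0, 9.5, 10.4, 11.4, 14.2, 8.8, 12.8)

# Break points taken from the description; intervals are right-closed,
# so a value equal to a break falls into the lower bin.
alcohol.bin <- cut(alcohol, breaks = c(7, 9.5, 10.4, 11.4, 14.2))

table(alcohol.bin)  # number of wines per alcohol bin
```

The resulting factor can be mapped to color or facets so that density and residual sugar can be compared across alcohol levels within each quality rating.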

Reflection

After plotting and analyzing (univariate, bivariate and multivariate) the white wine dataset, we come to the following conclusions:

01. We analyzed how alcohol, density and residual sugar have the strongest correlations.
02. The correlations of volatile acidity, pH and wine quality with alcohol surprised us.
03. Wine quality is divided into three categories (low, medium and high), with ratings from 3 to 9. Medium-quality wines, rated 5 and 6, account for the largest number of entries in the dataset. Wine quality depends on the taste of the consumer, but the best wines are rated 8 or 9.
04. Total and free sulfur dioxide are correlated with sulphates and with each other.
05. The alcohol level of the wine decreases as the residual sugar level grows. Alcohol also plays a key role in the investigation, as we already observed in our scatterplots.
06. The white wine data has no missing values and several types of variables to explore.
07. Most of the plotted boxplots look roughly normally distributed, and most of the scatterplots show the relationships expected from the analysis.
We struggled a little to analyze the outliers because there are many of them, but we overcame this. For further investigation we would need more data, and we would carry out statistical analysis and explore multiple variables through a deeper analysis of alcohol and wine quality. Having categorized the alcohol percentage, we could also do some extra analysis.

References

00. Udacity Diamond and facebook project.
01. https://s3.amazonaws.com/udacity-hosted-downloads/ud651/wineQualityInfo.txt
02. https://docs.google.com/document/d/e/2PACX-1vRmVtjQrgEPfE3VoiOrdeZ7vLPO_p3KRdb_o-z6E_YJ65tDOiXkwsDpLFKI3lUxbD6UlYtQHXvwiZKx/pub?embedded=true
03. http://r.789695.n4.nabble.com/Trellis-setting-xlim-or-ylim-by-data-range-in-whole-column-or-row-td795484.html
04. http://www.sthda.com/english/wiki/renaming-data-frame-columns-in-r
05. http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
06. https://ggplot2.tidyverse.org/reference/geom_histogram.html
07. https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html
08. https://www.winespectator.com/drvinny/show/id/How-Does-pH-Affect-Alcohol-in-Wine